Round inputs for dense unrolled RNN tests to make pytests more stable #1284


Merged

Conversation

JanFSchulte
Contributor

We are still having problems with the numerical stability of RNNs in the pytests for dense unrolled. This PR aims to fix that by applying the rounding @jmitrevs added in #1215 to these tests as well. Since I haven't been able to reproduce the failure locally, I couldn't verify that this actually fixes it.

Type of change

  • Bug fix (non-breaking change that fixes an issue)

Tests

Checklist

  • I have read the guidelines for contributing.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have installed and run pre-commit on the files I edited or added.
  • I have added tests that prove my fix is effective or that my feature works.

@JanFSchulte JanFSchulte added the please test Trigger testing by creating local PR branch label Apr 29, 2025
@@ -107,6 +107,7 @@ def test_resource_unrolled_rnn(rnn_layer, backend, io_type, static, reuse_factor
# Subtract 0.5 to include negative values
input_shape = (12, 8)
X = np.random.rand(50, *input_shape) - 0.5
X = np.round(X * 2**16) * 2**-16 # make it exact ap_fixed<32,16>
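The one-line change above can be sketched in isolation (the seed and shapes below are illustrative, not part of the PR): rounding to the nearest multiple of 2**-16 guarantees each input is exactly representable with 16 fractional bits, so the fixed-point simulation and the floating-point reference start from bit-identical inputs.

```python
import numpy as np

# Sketch of the test's rounding step (seed and shapes are illustrative).
rng = np.random.default_rng(0)
X = rng.random((50, 12, 8)) - 0.5   # subtract 0.5 to include negative values
Xq = np.round(X * 2**16) * 2**-16   # snap each value to a multiple of 2**-16

# Scaling by 2**16 recovers exact integers, so every value is representable
# with 16 fractional bits, and re-quantizing is a no-op (idempotent).
assert np.all(Xq * 2**16 == np.round(Xq * 2**16))
assert np.array_equal(Xq, np.round(Xq * 2**16) * 2**-16)
```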
Contributor
This is indeed ap_fixed<17, 0>. Though, since you mentioned the issue is more prevalent with unrolled dense, is there anything compromising bit-exactness between the implementations?

Contributor Author

I think it might be the RNNs, or maybe just the LSTM, in general. Jovan made this change to get the tests to behave better in the pytorch parser case. It seemed to have worked there, so I'd be in favor of merging this now to get more meaningful test results, and looking into why this is such an issue later.

@JanFSchulte
Contributor Author

This issue with the RNN precision is still causing pytests to fail in other PRs, btw. So I would still be in favor of merging this so that we don't get as many spurious test failures.

Contributor

@vloncar vloncar left a comment

The rest of the codebase uses this, so we can continue with it until a more universal solution comes up.

@vloncar vloncar merged commit 50381da into fastmachinelearning:main Jun 3, 2025
9 checks passed
@calad0i
Contributor

calad0i commented Jun 3, 2025

As the error we are looking at is ~0.05, and this PR reduces the input error by only ~1e-5, I'm a bit worried whether this will actually patch the issue. I assumed that setting the default in #1215 was the major reason it fixed the issue there, though in this case the default is already in place.
Locally, the failing LSTM test is a bit spooky on my end, as it did not appear before (or at least not this frequently). Did you observe this fixing the issue on your end? Let's merge it if it does. Otherwise maybe we should tweak the default bit-width configurations a bit (e.g., more fractional bits, though 5e-2 already looks large for this configuration, so maybe something else is going on).
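For scale, a quick back-of-the-envelope check (mine, not from the PR) of how small the input perturbation from this rounding is: snapping to the 2**-16 grid moves each value by at most half a step, 2**-17 ≈ 7.6e-6, which is consistent with the ~1e-5 figure and far below the ~5e-2 error being observed.

```python
import numpy as np

# Hypothetical check: worst-case perturbation introduced by the rounding.
rng = np.random.default_rng(1)
X = rng.random((50, 12, 8)) - 0.5
Xq = np.round(X * 2**16) * 2**-16
max_err = np.abs(Xq - X).max()
assert max_err <= 2**-17   # at most half of one 2**-16 quantization step
```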

@JanFSchulte
Contributor Author

I could never reproduce this issue locally, so I'm not sure if this will actually fix the tests. But a similar fix helped with the same issue for the torch RNN tests, so the hope is that it will help.
